Automatically generating hypertext by computing semantic similarity
Abstract
We describe a novel method for automatically generating hypertext links within and between newspaper articles. The method is based on lexical chaining, a technique for extracting the sets of related words that occur in texts. Links between the paragraphs of a single article are built by considering the distribution of the lexical chains in that article. Links between articles are built by considering how the chains in the two articles are related. By using lexical chaining we mitigate the problems of synonymy and polysemy that plague traditional information retrieval approaches to automatic hypertext generation. To motivate our research, we discuss the results of a study showing that humans are inconsistent when assigning hypertext links within newspaper articles. Even if humans were consistent, the time and cost of building a large hypertext make relying on human linkers untenable, which leaves automatic hypertext generation as the practical alternative. To determine how our hypertext generation methodology performs against other proposed methodologies, we present a study comparing it with a methodology based on a traditional information retrieval approach. In this study, subjects were asked to perform a question-answering task using a combination of links generated by our methodology and by the competing methodology. We show combined results for all subjects tested, along with results based on subjects’ experience in using the World Wide Web. We detail the construction of a system for performing automatic hypertext generation in the context of an online newspaper. The proposed system is capable of handling large databases of news articles efficiently.
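The idea of linking paragraphs by the distribution of related-word sets can be illustrated with a minimal sketch. It is not the authors' implementation. The hand-built `RELATED_GROUPS` table, the cosine comparison, and the `threshold` value are all assumptions introduced here for illustration. A real lexical chainer would derive related-word sets from a thesaurus or lexical database and would handle word senses, which this toy version does not.

```python
import math
from itertools import combinations

# Hypothetical, hand-built groups of related words. They stand in for the
# lexical chains that a real chainer would extract from the text itself.
RELATED_GROUPS = {
    "finance": {"bank", "loan", "money", "interest", "credit"},
    "weather": {"storm", "rain", "flood", "wind"},
}


def chain_vector(paragraph, groups=RELATED_GROUPS):
    """Count how many words of the paragraph fall into each related-word group."""
    words = [w.strip(".,;:!?\"'").lower() for w in paragraph.split()]
    return [sum(1 for w in words if w in members) for members in groups.values()]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_paragraphs(paragraphs, threshold=0.5):
    """Return index pairs of paragraphs whose group distributions are similar.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    vectors = [chain_vector(p) for p in paragraphs]
    return [
        (i, j)
        for i, j in combinations(range(len(paragraphs)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


paragraphs = [
    "The bank raised interest on every loan.",
    "A storm brought rain and wind to the coast.",
    "Credit is tight, and money for a new loan is scarce.",
]
links = link_paragraphs(paragraphs)
print(links)
```

In this toy input the first and third paragraphs share no content words except "loan", yet they are linked because their words fall into the same related-word group. That is a simplified picture of how grouping related words can ease the synonymy problem mentioned in the abstract.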
Similar resources
Automatically generating hypertext in newspaper articles by computing semantic relatedness
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...
Experiments on the automatic construction of hypertexts from texts
The problem of (semi-)automatically turning text into hypertext is one that has been identified as important to the growth and development of hypertext as a way of organising information. In this paper we describe an approach we have developed to semi-automatically generate a hypertext from linear texts. This is based on initially creating nodes and composite nodes composed of “mini-hypertexts”...
Building hypertext links in newspaper articles using semantic similarity
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related wo...
RDFa: Lightweight Semantic Enrichment for Hypertext Content
RDFa is a syntactic format that allows RDF triples to be integrated into hypertext content of HTML/XHTML documents. Although a growing number of methods or tools have been designed attempting at generating or digesting RDFa, comparatively little work has been carried out on finding a generic solution for publishing existing RDF data sets with the RDFa serialisation format. This paper proposes a...
RDFa2: Lightweight Semantic Enrichment for Hypertext Content
RDFa is a syntactic format that allows RDF triples to be integrated into hypertext content of HTML/XHTML documents. Although a growing number of methods or tools have been designed attempting at generating or digesting RDFa, comparatively little work has been carried out on finding a generic solution for publishing existing RDF data sets with the RDFa serialisation format. This paper proposes a...
Publication date: 1997